Sufficiency is a central concept because it lets us focus on the essential aspects of a dataset while ignoring irrelevant details.
Statistic, Sufficiency
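A statistic $T(X)$ is any function of the data $X$ (it may not depend on the parameter $\theta$).
A statistic $T(X)$ is sufficient (for the model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$) if the conditional distribution of $X$ given $T(X)$ is the same for all $\theta$, i.e. does not depend on $\theta$.
In short, sufficient statistics carry all the information about $\theta$.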
Example
Back to the coin flipping example. For the three assumptions, we denote respectively
Check for , we have so
This expression does not depend on $\theta$, so $T$ is sufficient for model 2.
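A small numerical check can make the definition concrete. The sketch below assumes the coin-flip model is $n$ i.i.d. Bernoulli($\theta$) flips with $T(X) = \sum_i X_i$; that model, and the names `conditional_given_sum`, `cond_a` and `cond_b`, are assumptions made for illustration. It enumerates every sequence and compares the conditional law of $X$ given $T$ under two values of $\theta$.

```python
from itertools import product
from math import comb, isclose

def conditional_given_sum(theta, n):
    """Return {x: P_theta(X = x | T = sum(x))} for n i.i.d. Bernoulli(theta) flips."""
    # Joint pmf of every binary sequence x (assumed model, see lead-in).
    joint = {x: theta ** sum(x) * (1 - theta) ** (n - sum(x))
             for x in product((0, 1), repeat=n)}
    # Marginal pmf of T, obtained by summing the joint over {x : sum(x) = t}.
    marginal = {}
    for x, p in joint.items():
        marginal[sum(x)] = marginal.get(sum(x), 0.0) + p
    return {x: p / marginal[sum(x)] for x, p in joint.items()}

n = 4
cond_a = conditional_given_sum(0.2, n)
cond_b = conditional_given_sum(0.7, n)

# The conditional law is uniform on {x : sum(x) = t}, whatever theta is.
for x in cond_a:
    assert isclose(cond_a[x], cond_b[x])
    assert isclose(cond_a[x], 1 / comb(n, sum(x)))
```

Under this assumed model the conditional probabilities agree for both values of $\theta$, which is what sufficiency of the number of heads requires.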
2 Factorization Theorem
Theorem (Factorization Theorem)
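Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a model with densities $p_\theta(x)$ with respect to a common measure $\mu$. Then $T(X)$ is sufficient iff there exist functions $g_\theta$ and $h$ such that $p_\theta(x) = g_\theta(T(x))\, h(x)$, with equality for almost every $x$ under $\mu$.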
Proof for Discrete
The proof can also be seen here. In this note, we only check the discrete case. WLOG assume $\mu$ is counting measure.
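Assume the factorization $p_\theta(x) = g_\theta(T(x))\, h(x)$ exists. Then for any $x$ with $T(x) = t$ we have
$$P_\theta(X = x \mid T = t) = \frac{p_\theta(x)}{\sum_{z:\,T(z)=t} p_\theta(z)} = \frac{g_\theta(t)\, h(x)}{\sum_{z:\,T(z)=t} g_\theta(t)\, h(z)} = \frac{h(x)}{\sum_{z:\,T(z)=t} h(z)},$$
which does not depend on $\theta$.
Conversely, if $T$ is sufficient, construct $g_\theta(t) = P_\theta(T(X) = t)$ and $h(x) = P(X = x \mid T(X) = T(x))$; here $h$ does not depend on $\theta$ by sufficiency. Then
$$p_\theta(x) = P_\theta(T(X) = T(x))\, P(X = x \mid T(X) = T(x)) = g_\theta(T(x))\, h(x).$$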
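Another example is order statistics. Let $X_1, \dots, X_n$ be real-valued observations from any model. If the joint distribution of $X = (X_1, \dots, X_n)$ is invariant to permutations of its coordinates (see exchangeability), then the order statistics $(X_{(1)}, \dots, X_{(n)})$ are sufficient.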
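The permutation invariance behind the order-statistics example can be checked numerically. The sketch below assumes an i.i.d. Cauchy location model purely for illustration (the model choice and the names `joint_density`, `perm_values`, `all_equal` are hypothetical). It evaluates the joint density at every reordering of a sample and compares it with the value at the sorted sample, so the density depends on $x$ only through its order statistics and the factorization holds with $h \equiv 1$.

```python
from itertools import permutations
from math import isclose, pi

def joint_density(x, theta):
    """Joint density of i.i.d. Cauchy(theta) observations (assumed model)."""
    dens = 1.0
    for xi in x:
        dens *= 1.0 / (pi * (1.0 + (xi - theta) ** 2))
    return dens

x = (2.5, -1.0, 0.3)
theta = 0.8

# Density at the order statistics (the sorted sample).
reference = joint_density(tuple(sorted(x)), theta)

# Density at every permutation of the sample.
perm_values = [joint_density(p, theta) for p in permutations(x)]
all_equal = all(isclose(v, reference) for v in perm_values)
```

Any other i.i.d. density could be substituted for the Cauchy one; only the product structure matters for the check.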
3 Minimal Sufficiency
For the example of , we showed that is sufficient. Then is also sufficient.
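Some sufficient statistics compress the data more than others: one statistic may be recoverable from another, but not the other way around. A sufficient statistic $T$ is called minimal if it can be recovered from every other sufficient statistic, i.e. $T(X) = f(S(X))$ for some function $f$ whenever $S$ is sufficient.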
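We say $x, y$ are equivalent (denoted $x \sim y$) if the likelihood ratio $p_\theta(x) / p_\theta(y)$ does not depend on $\theta$.
For the log-likelihood $\ell(\theta; x) = \log p_\theta(x)$, this means
$$\ell(\theta; x) - \ell(\theta; y) = c(x, y) \quad \text{for all } \theta.$$
For sufficient $T$, if $T(x) = T(y)$, then by the factorization theorem $\frac{p_\theta(x)}{p_\theta(y)} = \frac{g_\theta(T(x))\, h(x)}{g_\theta(T(y))\, h(y)} = \frac{h(x)}{h(y)}$, which does not depend on $\theta$ by sufficiency. So the following relation always holds:
$$T(x) = T(y) \implies x \sim y.$$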
Theorem
$T$ is minimal sufficient if $T(x) = T(y) \iff x \sim y$.
Proof
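First, show $T$ is sufficient. In the discrete case, for any $x$ with $T(x) = t$, we have
$$P_\theta(X = x \mid T = t) = \frac{p_\theta(x)}{\sum_{z:\,T(z)=t} p_\theta(z)} = \Big( \sum_{z:\,T(z)=t} \frac{p_\theta(z)}{p_\theta(x)} \Big)^{-1},$$
which does not depend on $\theta$, because every $z$ in the sum has $T(z) = T(x)$ and hence $z \sim x$.
Then show $T$ is minimal. Take another sufficient statistic $S$. Suppose $S(x) = S(y)$; then $x \sim y$ because $S$ is sufficient, so by the assumption of the theorem $T(x) = T(y)$. Now define $f$ on the range of $S$: for $s = S(x)$, set $f(s) = T(x)$.
For any other $y$ with $S(y) = s$, we must also have $T(y) = T(x)$, so $f$ is well defined and $T = f(S)$.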
Q.E.D.
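As a worked instance of the theorem, consider (as an assumption for illustration) $X_1, \dots, X_n$ i.i.d. Bernoulli($\theta$) with $\theta \in (0, 1)$ and $T(x) = \sum_i x_i$. The likelihood ratio is

$$\frac{p_\theta(x)}{p_\theta(y)} = \frac{\theta^{\sum_i x_i} (1 - \theta)^{n - \sum_i x_i}}{\theta^{\sum_i y_i} (1 - \theta)^{n - \sum_i y_i}} = \Big( \frac{\theta}{1 - \theta} \Big)^{\sum_i x_i - \sum_i y_i},$$

which is free of $\theta$ exactly when $\sum_i x_i = \sum_i y_i$. Under this assumed model, two samples are equivalent iff they have the same number of heads, so the theorem suggests that $T(X) = \sum_i X_i$ is minimal sufficient.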
Example (Laplace Location Family)
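$X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} p_\theta(x) = \frac{1}{2} e^{-|x - \theta|}$. Then $\ell(\theta; x) = -\sum_{i=1}^n |x_i - \theta| - n \log 2$. As a function of $\theta$, its graph is a combination of linear segments whose slope changes exactly at the data points $x_i$. So $x \sim y$ iff $x$ and $y$ have the same order statistics. So the order statistics $(X_{(1)}, \dots, X_{(n)})$ are minimal sufficient.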
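The Laplace example can be probed numerically. The sketch below assumes the standard Laplace location density $\frac{1}{2} e^{-|x - \theta|}$; the helper names `laplace_loglik` and `loglik_gap_is_constant`, the grid, and the sample values are all hypothetical. It checks whether the log-likelihood difference between two samples stays constant over a grid of $\theta$ values.

```python
from math import isclose, log

def laplace_loglik(theta, x):
    """Log-likelihood of i.i.d. Laplace(theta) data with density 0.5 * exp(-|x - theta|)."""
    return -sum(abs(xi - theta) for xi in x) - len(x) * log(2)

def loglik_gap_is_constant(x, y, grid):
    """Check whether l(theta; x) - l(theta; y) takes a single value over the grid."""
    gaps = [laplace_loglik(t, x) - laplace_loglik(t, y) for t in grid]
    return all(isclose(g, gaps[0], abs_tol=1e-9) for g in gaps)

grid = [k / 10 for k in range(-50, 51)]   # theta from -5 to 5 in steps of 0.1

x = (0.4, -1.2, 2.0)
y_perm = (2.0, 0.4, -1.2)       # same order statistics as x
y_same_sum = (0.0, 0.0, 1.2)    # same sum as x, different order statistics

same_order_stats = loglik_gap_is_constant(x, y_perm, grid)
same_sum_only = loglik_gap_is_constant(x, y_same_sum, grid)
```

Here `same_order_stats` comes out true and `same_sum_only` false: matching the sum alone does not make two samples equivalent in this family, while matching the order statistics does. A finite grid can only suggest constancy, not prove it.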
3.1 Minimal Form
Minimal Form
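The form $p_\theta(x) = e^{\eta(\theta)^\top T(x) - B(\theta)} h(x)$ is minimal if neither $\eta(\theta)$ nor $T(x)$ satisfies a linear constraint, i.e. there is no nonzero vector $a \in \mathbb{R}^s$ and constant $c \in \mathbb{R}$ s.t.
$$a^\top \eta(\theta) = c \ \text{ for all } \theta \qquad \text{or} \qquad a^\top T(x) = c \ \text{ for almost every } x.$$
Otherwise we can represent $p_\theta$ as an $r$-dimensional exponential family form for some $r < s$.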
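One way to look for a linear constraint on $T(x)$ is to compute the affine rank of its possible values: if the rank is smaller than the dimension of $T$, some nonzero $a$ and constant $c$ satisfy $a^\top T(x) = c$ for every $x$. The sketch below uses a hypothetical three-category example with $n = 3$ draws (not taken from the text above), and `affine_rank` is an illustrative helper. It only checks the $T(x)$ side of the definition, not $\eta(\theta)$.

```python
from fractions import Fraction
from itertools import product

def affine_rank(points):
    """Rank of the differences points[i] - points[0], via exact Gaussian elimination."""
    base = points[0]
    rows = [[Fraction(p[j] - base[j]) for j in range(len(base))] for p in points[1:]]
    rank = 0
    for col in range(len(base)):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Hypothetical example: n = 3 draws, each falling in category 0, 1 or 2.
samples = list(product(range(3), repeat=3))
counts_all = [tuple(s.count(k) for k in range(3)) for s in samples]  # T(x) in R^3
counts_two = [c[:2] for c in counts_all]                             # drop the last count

rank_all = affine_rank(counts_all)  # 2 < 3: the counts always sum to 3
rank_two = affine_rank(counts_two)  # 2 = 2: no linear constraint on these values
```

In this example the three-count statistic satisfies a linear constraint (the counts sum to 3), while dropping one count removes it, in line with the remark that a non-minimal form can be rewritten in lower dimension.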
Proposition
If $p_\theta(x) = e^{\eta(\theta)^\top T(x) - B(\theta)} h(x)$ is a minimal form, then $T(X)$ is minimal sufficient.
Proof
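By the theorem, we only need to show $x \sim y \implies T(x) = T(y)$. Since $\ell(\theta; x) - \ell(\theta; y) = \eta(\theta)^\top (T(x) - T(y)) + \log h(x) - \log h(y)$, we need to show that if $\eta(\theta)^\top (T(x) - T(y))$ is constant in $\theta$, then $T(x) = T(y)$.
Now, if $T(x) \neq T(y)$, then because $\eta(\theta)$ satisfies no linear constraint we can always find $\theta_1, \theta_2$ s.t. $(\eta(\theta_1) - \eta(\theta_2))^\top (T(x) - T(y)) \neq 0$ (this is undesirable, because $\ell(\theta; x) - \ell(\theta; y)$ should not depend on $\theta$), so $T(x) = T(y)$. Q.E.D.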
The converse of this proposition is not true.
3.2 Diagram
For case , let's consider the following example:
For , and , they are non-linear, so they are minimal (can't linearly transform to a constant vector).
For , it is not minimal when (because we can always take the normal vector of , then for any point on , we denote it as , we have this contradicts with the definition.)
However, when , is minimal. In this case, take as the new .